
Prototype batching lock intent for skip locked #31344

Draft
emilienoel wants to merge 1 commit into yugabyte:master from Shopify:prototype_batching_intent_SKIP_LOCKED

Conversation

@emilienoel emilienoel commented Apr 29, 2026

Batched SKIP LOCKED Walkthrough

This document walks through the rebased change and identifies whether each step happens in the YSQL layer or on the tserver side.

Definitions:

  • YSQL: PostgreSQL/YSQL layer, including executor and PgGate client-side API.
  • tserver: tablet server / DocDB side.

1. Enable batching via GUC

Where: YSQL

Files:

src/postgres/src/backend/utils/misc/guc.c
src/postgres/src/include/utils/guc.h

Adds:

yb_skip_locked_batch_size

Default:

32

Meaning:

  • 1 disables the optimization.
  • >1 allows YSQL executor to prefetch multiple candidate rows for SKIP LOCKED.

2. Detect eligible SKIP LOCKED query

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

In ExecLockRows, the batch path is used only when:

node->yb_are_row_marks_for_yb_rels &&
yb_skip_locked_batch_size > 1 &&
list_length(node->lr_arowMarks) == 1 &&
erm->waitPolicy == LockWaitSkip

So YSQL decides whether to use the optimization: the plan must lock a single YB relation (exactly one row mark) with the SKIP LOCKED wait policy.
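Taken together, the four conditions form a single predicate. A minimal standalone sketch (the enum and function name here are illustrative stand-ins, not the actual PostgreSQL types):

```c
#include <stdbool.h>
#include <assert.h>

/* Stand-in for PostgreSQL's LockWaitPolicy values. */
typedef enum { LOCK_WAIT_BLOCK, LOCK_WAIT_SKIP, LOCK_WAIT_ERROR } WaitPolicy;

/* Model of the batch-path eligibility check in ExecLockRows:
 * YB relations only, batching enabled, one row mark, SKIP LOCKED. */
static bool
use_batch_skip_locked(bool row_marks_for_yb_rels,
                      int yb_skip_locked_batch_size,
                      int num_row_marks,
                      WaitPolicy wait_policy)
{
    return row_marks_for_yb_rels &&
           yb_skip_locked_batch_size > 1 &&
           num_row_marks == 1 &&
           wait_policy == LOCK_WAIT_SKIP;
}
```

Note that setting the GUC to its disabled value (1) falsifies the second conjunct, so the existing single-row path is taken unchanged.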


3. Prefetch candidate rows from the child plan

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

Function:

ExecLockRowsBatchSkipLocked(...)

YSQL pulls up to yb_skip_locked_batch_size rows from the child plan:

slot = ExecProcNode(outerPlan);

For each candidate row, YSQL extracts the ybctid:

datum = ExecGetJunkAttribute(slot, aerm->ctidAttNo, &isNull);

Then stores:

batch_ybctids[batch_count] = datumCopy(...);
batch_tuples[batch_count] = ExecCopySlotHeapTuple(slot);

At this point, YSQL has a local batch of candidate rows.
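The prefetch loop can be sketched as follows. The child plan is reduced to an array of ybctid strings; exec_proc_node and copy_str are illustrative stand-ins for ExecProcNode and the datumCopy/ExecCopySlotHeapTuple copies:

```c
#include <stdlib.h>
#include <string.h>

#define MAX_BATCH 32

typedef struct {
    const char **rows;   /* remaining child-plan rows (ybctids as strings) */
    int          nrows;
    int          pos;
} ChildPlan;

/* Returns the next ybctid, or NULL when the child plan is exhausted. */
static const char *exec_proc_node(ChildPlan *plan)
{
    return plan->pos < plan->nrows ? plan->rows[plan->pos++] : NULL;
}

/* Stands in for datumCopy: the batch must own its copies, since the
 * source slot is reused on the next ExecProcNode call. */
static char *copy_str(const char *s)
{
    char *dup = malloc(strlen(s) + 1);
    strcpy(dup, s);
    return dup;
}

/* Pull up to batch_size candidates from the child plan.
 * Returns the number of candidates collected. */
static int prefetch_batch(ChildPlan *plan, int batch_size,
                          char *batch_ybctids[MAX_BATCH])
{
    int batch_count = 0;
    while (batch_count < batch_size && batch_count < MAX_BATCH) {
        const char *ybctid = exec_proc_node(plan);
        if (ybctid == NULL)
            break;                       /* child plan exhausted */
        batch_ybctids[batch_count++] = copy_str(ybctid);
    }
    return batch_count;
}
```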


4. Send candidate ybctids to PgGate

Where: YSQL

File:

src/postgres/src/backend/access/yb_access/yb_scan.c

Function:

YBCLockTupleBatch(...)

YSQL creates a new select statement:

YbcPgStatement ybc_stmt = YbNewSelect(...);

Then it adds each candidate ybctid:

YBCPgDmlAddBatchYbctidArg(ybc_stmt, data, len);

This is still YSQL-side code preparing the request.


5. PgGate appends batch arguments to the read request

Where: YSQL / PgGate client side

Files:

src/yb/yql/pggate/ybc_pggate.cc
src/yb/yql/pggate/pggate.cc
src/yb/yql/pggate/pg_dml_read.cc

Call chain:

YBCPgDmlAddBatchYbctidArg(...)

calls:

PgApiImpl::DmlAddBatchYbctidArg(...)

which calls:

PgDmlRead::AddBatchYbctidArg(...)

That appends each candidate to:

read_req_->add_batch_arguments();

So the request sent to the tserver contains:

batch_arguments = [ybctid0, ybctid1, ybctid2, ...]

This is PgGate building the tserver request, but it is on the YSQL side of the boundary.


6. Execute/fetch the YSQL statement, causing RPC to tserver

Where: starts in YSQL, crosses to tserver

File:

src/postgres/src/backend/access/yb_access/yb_scan.c

YSQL calls:

YBCPgExecSelect(ybc_stmt, &exec_params);
YBCPgDmlFetch(ybc_stmt, 0, NULL, NULL, NULL, &has_data);

The fetch is what forces the PgGate operation to perform the remote read RPC.

Boundary:

YSQL/PgGate  --->  tserver

7. tserver detects special batch SKIP LOCKED request

Where: tserver

File:

src/yb/tserver/read_query.cc

In ReadQuery::DoPerform, tserver checks:

has_row_mark &&
!serializable_isolation &&
req_->pgsql_batch_size() == 1 &&
pgsql_read.wait_policy() == WAIT_SKIP &&
pgsql_read.batch_arguments_size() > 1

If true, tserver chooses the new path:

TryLockBatchArg(0, ...);

So the tserver, not YSQL, decides how to process the batched lock request internally.


8. tserver tries to lock one batch argument at a time

Where: tserver

File:

src/yb/tserver/read_query.cc

Function:

ReadQuery::TryLockBatchArg(...)

For each candidate index, tserver builds a write operation that creates read-lock intents for only that candidate.

It calls:

tablet_ptr->CreateReadIntentForBatchArg(
    isolation_level,
    pgsql_read,
    batch_arg_index,
    &write_batch);

Then it submits:

peer->WriteAsync(std::move(query));

This is tserver-side async write/conflict resolution.


9. Tablet builds intents for exactly one candidate

Where: tserver

Files:

src/yb/tablet/tablet.cc
src/yb/tablet/tablet.h
src/yb/docdb/pgsql_operation.cc
src/yb/docdb/pgsql_operation.h

Call chain:

Tablet::CreateReadIntentForBatchArg(...)

calls:

docdb::GetIntentsForBatchArg(...)

That function picks one candidate:

const auto& batch_argument = request.batch_arguments(batch_arg_index);

and creates lock intents only for that candidate.

This is important: tserver does not lock all batch arguments at once.


10. tserver handles lock success or conflict

Where: tserver

File:

src/yb/tserver/read_query.cc

In the callback from WriteAsync:

If the lock succeeds

self->first_locked_batch_arg_index_ = batch_arg_index;
peer->Enqueue(self.get());

The tserver records the winning index and proceeds to read the row.

If the lock conflicts / transaction error occurs

TransactionError(status).value() != TransactionErrorCode::kNone

Then because this is SKIP LOCKED, tserver skips this candidate and tries the next one:

self->TryLockBatchArg(batch_arg_index + 1, ...);

If all candidates conflict

Eventually:

batch_arg_index >= total_batch_args

Then tserver sets:

first_locked_batch_arg_index_ = -1;

and proceeds to read phase with no winner.
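The retry logic of steps 8 through 10 reduces to "walk the candidates in order and stop at the first lock that succeeds". A synchronous sketch (the real path is asynchronous via WriteAsync callbacks; the outcome table here stands in for conflict resolution):

```c
#include <stdbool.h>

/* Demo stand-in for the WriteAsync/conflict-resolution round trip:
 * ctx points at a per-candidate outcome table. */
static bool lock_outcomes_try(int index, void *ctx)
{
    const bool *can_lock = ctx;
    return can_lock[index];
}

/* Model of the TryLockBatchArg loop: try candidates in order, skip each
 * one that conflicts (SKIP LOCKED semantics), and record the first index
 * whose lock succeeds. Returns the winning index, or -1 when every
 * candidate conflicts. */
static int first_lockable_index(int total_batch_args,
                                bool (*try_lock)(int index, void *ctx),
                                void *ctx)
{
    for (int i = 0; i < total_batch_args; i++) {
        if (try_lock(i, ctx))
            return i;        /* first_locked_batch_arg_index_ = i */
    }
    return -1;               /* all candidates conflicted */
}
```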


11. tserver reads only the winning candidate

Where: tserver

File:

src/yb/tserver/read_query.cc

Function:

ReadQuery::DoReadImpl()

The original request still contains all candidates:

batch_arguments = [row0, row1, row2, row3]

But after locking, tserver creates a modified effective request.

If there is a winner:

modified_req->clear_batch_arguments();
*modified_req->add_batch_arguments() = batch_argument_for_winner;
effective_req = modified_req;

So the actual read phase reads only the winning row.

If no candidate was locked:

modified_req->clear_batch_arguments();
effective_req = modified_req;

So the read returns zero rows.
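The winner-only rewrite can be modeled like this, with the request reduced to a fixed-size array of integer ybctids for illustration:

```c
#define MAX_ARGS 32

typedef struct {
    int ybctids[MAX_ARGS];   /* candidate ybctids, reduced to ints */
    int nargs;
} ReadRequest;

/* Model of the effective-request rewrite in ReadQuery::DoReadImpl:
 * clear the batch arguments, then re-add only the winner; when there is
 * no winner (winner < 0), the argument list stays empty and the read
 * phase returns zero rows. */
static ReadRequest make_effective_request(const ReadRequest *req, int winner)
{
    ReadRequest modified = {{0}, 0};          /* clear_batch_arguments() */
    if (winner >= 0 && winner < req->nargs) {
        modified.ybctids[0] = req->ybctids[winner];
        modified.nargs = 1;                   /* add the winning argument */
    }
    return modified;
}
```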


12. tserver populates response metadata

Where: tserver

File:

src/yb/tserver/read_query.cc

After reading, tserver sets:

result.response->set_batch_arg_count(pgsql_read_req.batch_arguments_size());

That tells PgGate:

All batch arguments were consumed/tried.

If there was a winner, tserver also sets:

result.response->set_first_locked_batch_arg_index(first_locked_batch_arg_index_);

This field was added in:

src/yb/common/pgsql_protocol.proto

as:

optional int32 first_locked_batch_arg_index = 22;

So the tserver response carries the winning candidate index back to YSQL.

Boundary:

tserver  --->  YSQL/PgGate

13. PgGate reads the winner index from response

Where: YSQL / PgGate client side

Files:

src/yb/yql/pggate/pg_doc_op.cc
src/yb/yql/pggate/pg_dml_read.cc
src/yb/yql/pggate/pggate.cc
src/yb/yql/pggate/ybc_pggate.cc

Call chain:

YBCPgDmlGetFirstLockedBatchArgIndex(...)

calls:

PgApiImpl::DmlGetFirstLockedBatchArgIndex(...)

then:

PgDmlRead::GetFirstLockedBatchArgIndex()

then:

PgDocReadOp::GetFirstLockedBatchArgIndex()

which reads:

resp->first_locked_batch_arg_index()

If absent, it returns:

-1

14. YSQL maps the winner index to a PostgreSQL lock result

Where: YSQL

File:

src/postgres/src/backend/access/yb_access/yb_scan.c

Back in:

YBCLockTupleBatch(...)

YSQL gets:

winner

Then:

if (winner >= 0 && winner < count)
{
    *locked_index = winner;
    res = TM_Ok;
}
else
{
    *locked_index = -1;
    res = TM_WouldBlock;
}

So YSQL converts tserver's response into PostgreSQL executor semantics:

  • winner found: TM_Ok
  • no winner: TM_WouldBlock
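The mapping is small enough to model directly (TM_Ok and TM_WouldBlock here are stand-ins for PostgreSQL's TM_Result values):

```c
typedef enum { TM_Ok, TM_WouldBlock } TmResult;

/* Model of the winner-to-lock-result mapping in YBCLockTupleBatch:
 * a valid winner index means one candidate was locked (TM_Ok);
 * anything else means the whole batch conflicted (TM_WouldBlock). */
static TmResult map_winner(int winner, int count, int *locked_index)
{
    if (winner >= 0 && winner < count) {
        *locked_index = winner;
        return TM_Ok;
    }
    *locked_index = -1;
    return TM_WouldBlock;
}
```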

15. YSQL returns the winning tuple

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

Back in:

ExecLockRowsBatchSkipLocked(...)

If the batch lock returned TM_Ok, YSQL restores the winning tuple into the result slot:

ExecForceStoreHeapTuple(batch_tuples[locked_index], result_slot, true);

Then it returns that tuple to the upper executor nodes.

So if the query was:

SELECT * FROM jobs FOR UPDATE SKIP LOCKED LIMIT 1;

this is the point where the selected row goes back up the executor tree.


16. YSQL saves candidates after the winner as leftovers

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

Suppose the batch was:

[row0, row1, row2, row3]

and tserver locked:

row1

Then:

  • row0 was tried and skipped.
  • row1 is returned.
  • row2, row3 were prefetched but not tried.

YSQL stores candidates after the winner:

node->yb_batch_leftover_tuples
node->yb_batch_leftover_ybctids

These are used on later calls to ExecLockRows.
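The leftover bookkeeping amounts to keeping every candidate positioned after the winner. A minimal sketch, with candidates reduced to integers:

```c
/* Model of the leftover split: candidates before the winner were tried
 * and skipped on the tserver, the winner is returned, and everything
 * after the winner is saved for later ExecLockRows calls. Returns the
 * leftover count (node->yb_batch_leftover_count in the real code). */
static int save_leftovers(const int *batch, int batch_count, int winner,
                          int *leftovers)
{
    int n = 0;
    for (int i = winner + 1; i < batch_count; i++)
        leftovers[n++] = batch[i];
    return n;
}
```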


17. YSQL tries leftovers before scanning more rows

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

At the top of ExecLockRows, before fetching a new row from the child plan, YSQL checks:

if (node->yb_batch_leftover_count > node->yb_batch_leftover_idx)

If leftovers exist, it calls:

ExecLockRowsTryLeftover(...)

That tries each leftover using the existing single-row path:

YBCLockTuple(...)

The leftover entries themselves do not store a table id or relation key. They only store parallel
arrays of:

ybctid
tuple

The table is recovered from the same LockRowsState row mark:

ExecAuxRowMark *aerm = (ExecAuxRowMark *) linitial(node->lr_arowMarks);
ExecRowMark *erm = aerm->rowmark;

and the lock call uses:

YBCLockTuple(erm->relation, ybctid, erm->markType, LockWaitSkip, estate);

This is safe because the batch optimization is only enabled when:

list_length(node->lr_arowMarks) == 1

So every prefetched candidate and every leftover belongs to the single row-marked YB relation for
this LockRowsState. For multi-row-mark queries, such as joins with multiple locked tables, the
batch path is not used because a bare leftover ybctid would not be enough to identify which
relation to lock.

So after the first batch winner, later prefetched-but-untried rows are not lost.


18. YSQL cleans up leftovers

Where: YSQL

File:

src/postgres/src/backend/executor/nodeLockRows.c

In:

ExecEndLockRows(...)

YSQL frees any unconsumed leftover tuples and ybctids.


Compact end-to-end sequence

| Step | Layer | What happens |
|------|-------|--------------|
| 1 | YSQL | GUC yb_skip_locked_batch_size controls batch size |
| 2 | YSQL | ExecLockRows detects eligible FOR UPDATE SKIP LOCKED |
| 3 | YSQL | Executor prefetches candidate rows and extracts ybctids |
| 4 | YSQL | YBCLockTupleBatch builds a select request |
| 5 | YSQL/PgGate | PgGate adds ybctids as batch_arguments |
| 6 | YSQL -> tserver | Fetch triggers RPC |
| 7 | tserver | ReadQuery::DoPerform detects batched SKIP LOCKED |
| 8 | tserver | TryLockBatchArg tries candidate 0 |
| 9 | tserver | Tablet/DocDB creates intent for only that candidate |
| 10 | tserver | On conflict, try next candidate; on success, record winner |
| 11 | tserver | Read phase reads only winning candidate |
| 12 | tserver | Response includes first_locked_batch_arg_index |
| 13 | tserver -> YSQL | Response returns to PgGate |
| 14 | YSQL/PgGate | PgGate exposes winner index |
| 15 | YSQL | YBCLockTupleBatch maps winner to TM_Ok / TM_WouldBlock |
| 16 | YSQL | Executor returns winning tuple |
| 17 | YSQL | Executor saves later prefetched candidates as leftovers |
| 18 | YSQL | Future calls try leftovers before scanning more rows |
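The whole sequence can be condensed into a toy simulation: prefetch a batch, return the first lockable candidate, stash the rest as leftovers, and drain leftovers before prefetching again. All names here are illustrative, and a bool table stands in for which rows are currently unlocked:

```c
#include <stdbool.h>

#define N 4   /* max batch / leftover size for the demo */

typedef struct {
    int next_row;         /* next row the "child plan" would produce */
    int leftovers[N];
    int n_left, left_idx;
} LockRowsState;

/* Returns the next successfully locked row id, or -1 when the table of
 * total_rows rows is exhausted. batch_size must be <= N. */
static int next_locked_row(LockRowsState *s, const bool *free_rows,
                           int total_rows, int batch_size)
{
    for (;;) {
        /* Steps 17-18: try leftovers from the previous batch first. */
        while (s->left_idx < s->n_left) {
            int row = s->leftovers[s->left_idx++];
            if (free_rows[row])
                return row;
        }
        /* Steps 3-16: prefetch a fresh batch and find a winner. */
        int batch[N], count = 0;
        while (count < batch_size && s->next_row < total_rows)
            batch[count++] = s->next_row++;
        if (count == 0)
            return -1;                     /* nothing left to scan */
        s->n_left = s->left_idx = 0;
        for (int i = 0; i < count; i++) {
            if (free_rows[batch[i]]) {
                /* Save candidates after the winner as leftovers. */
                for (int j = i + 1; j < count; j++)
                    s->leftovers[s->n_left++] = batch[j];
                return batch[i];
            }
        }
        /* Whole batch conflicted; loop and prefetch the next batch. */
    }
}
```

With rows 1 and 2 free and rows 0 and 3 locked, successive calls yield 1, then 2 (from the leftovers, without rescanning), then -1.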


